Leveraging Compounds to Improve Noun Phrase Translation from Chinese and German

Authors

  • Xiao Pu
  • Laura Mascarell
  • Andrei Popescu-Belis
  • Mark Fishel
  • Ngoc-Quang Luong
  • Martin Volk
Abstract

This paper presents a method to improve the translation of polysemous nouns by leveraging their previous occurrence as the head of a compound noun phrase. First, the occurrences are identified through pattern-matching rules, which detect an XY compound followed closely by a potentially coreferent occurrence of Y, such as “Mooncakes ... cakes ...”. Second, two strategies are proposed to improve the translation of the second occurrence of Y: re-using the cached translation of Y from the XY compound, or post-editing the translation of Y using the head of the translation of XY. Experiments are performed on Chinese-to-English and German-to-French statistical machine translation, with about 250 occurrences of XY/Y, from the WIT3 and Text+Berg corpora. The results and their analysis suggest that while the overall BLEU scores increase only slightly, the translations of the targeted polysemous nouns are significantly improved.
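The two steps in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the suffix-matching rule, the `max_gap` window, and the function names `find_compound_pairs` and `reuse_head_translation` are assumptions made for illustration, and a real system would rely on the paper's own pattern rules plus word alignment to locate Y in the output.

```python
from typing import List, Tuple


def find_compound_pairs(
    tokens: List[str], max_gap: int = 20, min_head_len: int = 3
) -> List[Tuple[int, int]]:
    """Return (i, j) pairs where tokens[i] looks like an XY compound and
    tokens[j] is a later standalone occurrence of its head Y.

    Toy rule (an assumption, not the paper's rules): XY is any token that
    ends with Y and is strictly longer than Y, at most max_gap tokens back.
    """
    lowered = [t.lower() for t in tokens]
    pairs = []
    for j, head in enumerate(lowered):
        if len(head) < min_head_len:
            continue  # skip very short tokens, which match too easily
        # Scan backwards for the nearest earlier compound ending in this head.
        for i in range(j - 1, max(-1, j - 1 - max_gap), -1):
            cand = lowered[i]
            if len(cand) > len(head) and cand.endswith(head):
                pairs.append((i, j))
                break
    return pairs


def reuse_head_translation(
    target_tokens: List[str], y_position: int, cached_head: str
) -> List[str]:
    """Caching strategy, simplified: overwrite the translation of Y at
    y_position with the head translation cached from the XY compound.
    The position is assumed to be known (e.g. from word alignment)."""
    out = list(target_tokens)
    out[y_position] = cached_head
    return out


source = "Mooncakes are sold in autumn and the cakes are sweet".split()
pairs = find_compound_pairs(source)
```

On the abstract's own example, `pairs` contains one pair linking "Mooncakes" to the later "cakes". The post-editing strategy would differ only in where the replacement word comes from: the head of the system's translation of XY rather than a cached translation.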

Similar resources

Feature-Rich Statistical Translation of Noun Phrases

We define noun phrase translation as a subtask of machine translation. This enables us to build a dedicated noun phrase translation subsystem that improves over the currently best general statistical machine translation methods by incorporating special modeling and special features. We achieved 65.5% translation accuracy in a German-English translation task vs. 53.2% with IBM Model 4.

Effects of Noun Phrase Bracketing in Dependency Parsing and Machine Translation

Flat noun phrase structure was, up until recently, the standard in annotation for the Penn Treebanks. With the recent addition of internal noun phrase annotation, dependency parsing and applications down the NLP pipeline are likely affected. Some machine translation systems, such as TectoMT, use deep syntax as a language transfer layer. It is proposed that changes to the noun phrase dependency ...

Bilingually-Constrained Recursive Neural Networks with Syntactic Constraints for Hierarchical Translation Model

Hierarchical phrase-based translation models have advanced statistical machine translation (SMT). Because such models can improve leveraging of syntactic information, two types of methods (leveraging source parsing and leveraging shallow parsing) are applied to introduce syntactic constraints into translation models. In this paper, we propose a bilingually-constrained recursive neural network (...

Investigating Embedded Question Reuse in Question Answering

This paper presents a novel method in question answering (QA) that enables a QA system to gain performance by reusing information from the answer to one question to answer another, related question. Our analysis shows that a pair of questions in general open-domain QA can have an embedding relation through their mentions of noun phrase expressions. We present methods f...

Identifying main obstacles for statistical machine translation of morphologically rich South Slavic languages

The best way to improve a statistical machine translation system is to identify concrete problems causing translation errors and address them. Many of these problems are related to the characteristics of the involved languages and differences between them. This work explores the main obstacles for statistical machine translation systems involving two morphologically rich and under-resourced lan...

Publication date: 2015